DOCUMENT RESUME 



AL 001 812



ED 027 540 

By-Robinson, Peter

Basic Factors in the Choice, Composition and Adaption of Second Language Tests.

Pub Date Mar 69 

Note-11p.; Paper given at the third annual TESOL Convention, Chicago, Illinois, March 1969.

EDRS Price MF-$0.25 HC-$0.65
Descriptors-Aptitude Tests, Diagnostic Tests, *English (Second Language), *Language Tests, *Second
Language Learning, *Test Construction, *Test Selection

Identifiers-Classification Tests, *Evaluation Tests, Prediction Tests, Progress Tests

Generally speaking, the main purposes of second language tests are survey, 
didactic, psychological or sociological research; and evaluation, the latter being the 
concern of this paper. Evaluation tests measure the knowledge the learner has of the 
second or foreign languages, and may be subdivided into the following categories: (1) 
aptitude tests, which assess a person’s capacity to learn another language; (2) 
diagnosis tests, which are either "inventory," and attempt to make a complete list of
what the student knows in the various areas of the spoken and written language, or
"error," which seek to identify and explain specific student mistakes; (3) classification
tests, which divide students up into various levels of language competence for the
purpose of forming homogeneous classes; (4) prediction tests, which are used to
predict the student’s handling of the second or foreign language in specific social
and work situations where the second or foreign language is the only language used;
and (5) progress tests, which try to measure the student’s progress in a given
program. Once the purpose of the test has been determined, the following stages fall
into place: level, type, selection, form, gradation, order, number of items,
administration of test, correction, and validation. These points are discussed in turn 
and are followed by a listing of recent writings on testing in second languages. (AMM)



"Basic Factors in the Choice, Composition and Adaption of Second Language Tests"

U.S. DEPARTMENT OF HEALTH, EDUCATION & WELFARE
OFFICE OF EDUCATION

THIS DOCUMENT HAS BEEN REPRODUCED EXACTLY AS RECEIVED FROM THE
PERSON OR ORGANIZATION ORIGINATING IT. POINTS OF VIEW OR OPINIONS
STATED DO NOT NECESSARILY REPRESENT OFFICIAL OFFICE OF EDUCATION
POSITION OR POLICY.



by Peter Robinson
Head, TESOL Programme
Etudes Anglaises
Faculté des Lettres



(Paper given at the Third Annual
Convention of TESOL
Chicago, Illinois
March 5-8, 1969)






The most important factors in the choice, composition and adaption of
second language tests would seem to be the kind of test to use (oral
comprehension test, reading comprehension test, etc.) and what language items
the test should contain; but in fact these questions are of secondary importance
and depend entirely on the purpose of the test and the kind of people the test
sets out to measure.

Generally speaking, there are five main purposes: survey; didactic research;
psychological research; sociological research; and finally evaluation, with which
this paper is concerned.

Survey tests are used to gather information about the second language
competence of various ethnic groups in a particular country where more than one
language is currently spoken, Belgium or Canada for example. Survey tests are
of course not restricted to bilingual or multilingual situations, but can be
applied in countries where one language is current to measure the foreign language
competence of various groups, e.g., the oral French of secondary school-children
in England.

Didactic research is concerned with the effectiveness of different teaching
techniques, different manuals, programmes, audio-visual aids, even with the
assessment of teaching competence. Tests are used to show, for example, that a
particular teaching technique is more effective than another.

Psychological tests are concerned with the way a person learns another language
and with the way the acquisition of the new language affects his mother tongue and
his personality.

Sociological tests cover more or less the same area as psychological tests,
but at the level of the group, not of the individual. The whole question of
contact and conflict between groups speaking different languages is examined.

Evaluation tests subdivide into five main categories: aptitude, diagnosis,
classification, prediction and progress. They are concerned with measuring the
knowledge the learner has of the second or foreign language.

The first of these five categories of evaluation tests, aptitude, the object
of which is to assess a person’s capacity to learn another language, can be of
great help to the teacher in giving him some idea of how far and how fast a
certain prospective student may progress and what kind of help he may need. A
distinction must be made here between the general aptitude test just described
and the limited aptitude test, which deals with the student’s capacity to learn
a certain language. The latter test gives no indication of his aptitude at
all, but simply identifies the kind of problems that the student will meet in
learning that language.

Diagnosis tests break down into two sub-categories: inventory and error.
The inventory category attempts to make as complete a list as possible of what
the student knows in the various areas of the spoken and written language, while
the error category seeks to identify and to explain specific student mistakes.

Classification tests divide students up into various levels of language
competence for the purpose of forming homogeneous classes. These levels and
their sub-divisions (beginning level 1, 2 and 3; intermediate level 1, 2 and 3;
advanced level 1, 2 and 3) are arbitrary levels which are more or less clearly
defined by the teacher and the programme director.

Prediction tests are used to predict the student's handling of the second
or foreign language in specific social and work situations where the second or
foreign language is the only language used. A good example of a prediction test
is the test in English for foreign students applying for admission to
an English-speaking university. The test selects a certain number of students who
are thereby supposed to have the minimum competence in English required to begin
their studies at the university.

Progress tests try to measure the student's progress in a given programme.
There are two kinds of progress tests, the overall progress test
and the interim progress test. The former measures the student's overall progress
from the beginning to the end of the course, whereas the latter deals with the
extent to which the student has learnt the material of one or more lessons.

Once the purpose of the test has been determined, the following stages fall
into place: level; type; selection; form; gradation; order; number of items;
administration of test; correction; and validation.

After purpose, level is the most critical stage, since it determines the type
of test, the language items to be included in the test and the form the test will
take. Level is simply the amount of English the test assumes that the student
should know to meet the requirements of one or more situations. The following two
examples, admission tests for foreign students applying for admission to an
English-speaking university and progress tests in a given course, will illustrate
what level means in practice.

In the first example, admission tests, the level is defined in terms of
the following situations: attendance at lectures and seminars; amount of reading
required; number of written assignments. The level will be the minimum amount of
English required to function efficiently and adequately in those situations.

As regards the second example, progress tests, the level for the interim
tests is the language content of the manual used in class, while the level for the
overall tests is what the teacher and the programme director think the beginning,
intermediate and advanced student should know.

The type of test to be used is entirely dependent upon the level. For
university admission tests, oral comprehension, oral expression, reading comprehension,
and composition tests cover the language skills in which the foreign student must
possess a certain minimum competence in order to carry out his studies. In actual
practice, only oral comprehension and reading comprehension tests are used, as it
is extremely difficult to make a rapid, consistent assessment of the student's
ability to write and to speak. As regards progress tests, the type of test will
be determined by what has been taught in class.

Selection, namely what to include in the test, is directly related to the
level. In the case of university admission tests, selection is made in a series
of stages: the first matter to settle is whether to select material from the
undergraduate or graduate levels; the next question is to decide which lectures and
seminars to record; then a list of vocabulary, grammar and phonetic items is drawn
up from the recordings; next, a certain number of these items in a certain
proportion are selected for inclusion in the test; finally, of the items selected
for inclusion in the test a certain number are chosen to directly assess the
student's competence. The procedure is obviously not so lengthy and complex as
regards progress tests.

As regards the form of the test, two basic decisions have to be made: objective
or non-objective form for the student's answers; particular variant of a test type,
i.e., vocabulary, grammar, phonetic, or semantic oral comprehension test, any one
or any combination of the above.

Objective test or form is a misnomer, as it gives the misleading impression
that the test so described is an independent, detached, eminently reliable,
scientific evaluation. In fact the objective test is not intrinsically more
reliable than the non-objective type. The difference between the two is no more
than a question of procedure. In the non-objective form, the student, in response
to a series of questions, makes a free, active use of the second language, whereas,
in the objective form, the answer is already given, and all he has to do is indicate
by a mark which answer is most appropriate out of the four answers that appear with
each question.

The characteristic feature therefore of the objective form is its limitation
of the student’s participation and choice to selecting the right answer out of
four given possible answers.

In many cases, it is really only a choice between two possible answers, as
the other two are so obviously wrong for the intermediate student that he can
narrow the choice down to two, and thereby have a 50% chance of selecting
the right answer by a simple guess. This can largely defeat the purpose of having
four answers per question, which is to reduce the chance factor, and can make
interpretation of the results extremely hazardous.
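
The arithmetic behind this point can be put as a short sketch in Python. It is an
illustration only: the function names and the 40-item test length are assumptions
introduced here, not figures from any test discussed in this paper, and the model
supposes blind guessing among whatever alternatives the student has not ruled out.

```python
from fractions import Fraction


def guess_probability(n_options, n_eliminated=0):
    """Chance of hitting the right answer by blind guessing once
    n_eliminated obviously wrong alternatives have been ruled out."""
    remaining = n_options - n_eliminated
    if remaining < 1:
        raise ValueError("at least one alternative must remain")
    return Fraction(1, remaining)


def expected_chance_score(n_items, n_options, n_eliminated=0):
    """Expected number of items answered correctly by guessing alone."""
    return n_items * guess_probability(n_options, n_eliminated)


# Hypothetical 40-item test with four alternatives per item.
intended = expected_chance_score(40, 4)     # all four alternatives plausible
actual = expected_chance_score(40, 4, 2)    # two alternatives obviously wrong
print(intended, actual)
```

On these assumed figures the score obtainable by guessing alone doubles, from 10
to 20 items out of 40, once two of the four alternatives can be dismissed at sight.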

While some students are helped towards the right answer, other students'
attention is distracted: they concentrate on the irrelevant answers and end up
either by selecting a wrong one or by wasting too much time in finding the right
one. They either do not finish the test or have to rush through certain parts of
the test in order to complete it. Once again interpretation of the results is
extremely hazardous.

The objective form, or multiple choice as it is sometimes called, with its
four answers, one right, the other three completely wrong, does not discriminate
between different levels of language competence. The student is not faced with
a real choice between four truly possible solutions which, considered separately,
are all equally correct, but which, considered together, sort themselves out in
order of probability as right solutions. When the choice is not real, when the
possible answers are not scored according to their degree of appropriateness as
the right solution, the test may fail to distinguish between the student who
knows nothing, the student who knows something and the student who knows a great deal.

Choice between the correct answer and typical errors made by the student
is a valid procedure for groups who have a known, particular error pattern. But
obviously Spaniards and Frenchmen do not make the same kind of errors in learning
English, and a test effective with the Spanish group would be useless with the
French, or any other different national group, or with a group made up of people
of different nationalities, as is the case with university admission tests.

The irrelevant alternative answers of the objective test can take on a
surprising relevance for particular national groups. The following example,
taken from a vocabulary test given to 618 French Canadian first year education
students at Laval in 1968, illustrates this well. It is a question of choosing
the right synonym for revise.

    change    33%
    see       43%
    paint      2%
    learn     22%

It is clear that one distractor was useless (paint) and this narrowed the choice
down to three. The preference for see arises probably out of association with
reviser in the mother tongue, which, unlike its English cognate, does not have
the meaning of change, but only to look at again with the possibility of
modification. There is also in French close association both in usage and in origin
between revoir and reviser. Learn may come from association with revision
exercises, or more simply from students reading too much into the question, or
even from students being unable to make up their mind between change and see.

Perhaps the most pertinent criticism that could be made against the multiple
choice objective form is that it attempts to evaluate the student's competence in
a particular language skill in such a passive way, and on the basis of his
performance in a very limited area covered by a very small number of questions.

As the student's language competence is assessed within the very limited
range of a determined number of questions, 30-100 normally as regards any language
skill, it is highly important that the range covered by the questions reflects as
accurately as possible the situation in which the student uses the language. The
particular variant of a type of test has to be chosen with this in mind. It would
seem as regards university admission tests that oral comprehension tests that
concern themselves with the student's ability to distinguish between certain sounds,
to recognize certain grammatical forms, to recall names and numbers, to know the
meaning of individual words, rather than to grasp the general meaning of one
or two sentences, would be trying to predict the student's performance in
the lecture by insignificant and inappropriate criteria. A student's ability to
consistently distinguish between b and 2 may not be crucial to his understanding
of a lecture.

Gradation, the grading of the difficulty of the questions in the test, and
order, the sequence in which those questions appear, are not a purely linguistic
exercise. Gradation and order are also determined by the make-up of the group
and by the situation in which the group has used and/or will use the second language.
There is no standard system of gradation applicable to all groups and all tests,
but simply one which is valid for a particular group. The gradation for an oral
comprehension test for absolute beginners who have done 50 hours of English will
evidently not be the same as for an oral comprehension test administered to foreign
students applying for admission to an English-speaking university.

Involved with gradation and at the same time with administration, namely, the
conditions under which the students take the test, are a series of factors. The
first series of factors, for want of a better term, can be designated as
presentation factors, that is the way the content of the test is presented to the
student, for example, whether the test content is presented orally or in a written
form; if oral, whether a tape recorder is used; if oral, whether in the form of
a dialogue; if a dialogue, duration and number of dialogues, etc. The second
group of factors can be called student participation factors, namely, the way the
student indicates his answers to the test, for example, in writing or orally; if in
writing, whether he writes complete sentences, whether he fills in missing words,
whether he enters a mark in a box, etc. Lastly, there are what might be called
locale factors, that is the kind of place in which the test is given.

Correction is more than just tabulating the scores. Correction is the
quantitative assessment of the importance the author of the test attributes to
each question and to each answer.

Standardization or normalization is the final and most critical stage in the
composing of tests, since the whole usefulness of the test is assessed.
Unfortunately, a statistically satisfying picture of scores plotted evenly along a
normal curve is no guarantee of the test's linguistic usefulness. The score
distribution is purely a result of the composition of the group, and varies from
group to group. In fact a normalized test is no more than a test which produces the
same results with similar groups; whether these results mean anything linguistically
is another matter.

While statistical profiles of tests are in no way an indication as to the
test's worth as a test, they are essential in providing the necessary data on
which to base the evaluation of the test's fulfilment of its goal.

For objective tests, and this holds good also for the non-objective type,
normalization procedure is basically a detailed, statistical analysis of the
students' answers. The first analysis with the whole group involves noting down
the number and percentage of students that chose each alternative answer, and
then lists showing the different selections are drawn up. The second analysis is
identical to the first, but this time the group is no longer treated as a whole but
is divided up into three or four sub-groups according to the mark in the test;
for example, students with a mark between 0 and 30 are classified as weak; those
with a mark between 30 and 80 are intermediate; and those between 80 and 100 are
strong. The purpose of such an arbitrary division is to see whether the choice of
each group follows a consistent pattern and whether the pattern differs
considerably from the pattern for the whole group. The third analysis is a detailed
comparison between the performance of similar groups on the same test. In this way
it is possible to single out those questions which need to be revised: one question
may be too easy even for the beginning group, while even the strong group may find
another too difficult; or again certain questions will appear
to be ambiguous, as, each time the test is taken, similar groups of students vary
considerably in their answers, while being consistent with other questions.
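
The first two analyses can be sketched in Python as follows. This is a minimal
illustration, not the procedure actually used at Laval: the band limits of 30 and 80
come from the example above, but the treatment of boundary marks (exactly 30 counted
as intermediate, exactly 80 as strong), the function names and the sample answers are
all assumptions made for the sketch.

```python
from collections import Counter


def band(mark):
    """Classify a test mark (0-100) using the limits given in the text.
    Assumption: a mark of exactly 30 or 80 falls in the higher band."""
    if mark < 30:
        return "weak"
    if mark < 80:
        return "intermediate"
    return "strong"


def selection_table(choices):
    """Number and percentage of students choosing each alternative."""
    counts = Counter(choices)
    total = len(choices)
    return {alt: (n, round(100 * n / total, 1))
            for alt, n in sorted(counts.items())}


def analyse_question(responses):
    """responses: list of (mark in the whole test, alternative chosen)
    for one question. Returns the table for the whole group and one
    table per sub-group."""
    whole = selection_table([alt for _, alt in responses])
    groups = {}
    for mark, alt in responses:
        groups.setdefault(band(mark), []).append(alt)
    by_band = {name: selection_table(alts) for name, alts in groups.items()}
    return whole, by_band


# Invented answers to a single question, for illustration only.
sample = [(20, "see"), (25, "see"), (50, "change"),
          (60, "see"), (85, "change"), (90, "change")]
whole, by_band = analyse_question(sample)
print(whole)
print(by_band)
```

Comparing the table for the whole group with the tables for each sub-group shows at
a glance whether a distractor attracts only the weak students or draws answers from
every band. The third analysis would repeat the same tabulation for a second, similar
group and compare the two sets of tables.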

Statistics provide the means to isolate and measure the variations in the
students' choice of answers. However, it is up to the author of the test, the linguist,
the teacher, the programme director to explain these variations.

The writing of a test and the use of a test require that a certain, fundamental
procedure be followed in order that useful results be achieved: purpose of test;
type of group to be tested; level of English of group to be tested; type of
evaluation test to be used; language skills to be tested; selection of test content;
form of test; gradation; order; administration; correction; normalization. It has
been made clear that great care has to be exercised with the objective form, as it
may produce a test that means nothing. One danger is that it may distract the student's
attention from the essential point; another danger is that it may make it easier
for some students to guess the right answer; another point is that the student
does not exercise any real choice; an even greater drawback, and perhaps the most
important one, is that the student’s active participation is zero and that his use
of the language is neither seen nor heard; and finally his use of the language is
surmised from such little evidence.

Of course this does not mean that the objective form always produces unreliable
results, but simply that there are built-in defects. The following variants in the
objective form may go some way in dealing with the inherent problems:

1) of the four alternatives all are possible, but one is clearly the best.

2) as above, but the other three alternatives are scored according to their
   degree of possibility.

3) only two alternatives are offered, with the following modification:
     1- both are right.
     2- both are wrong.
     3- only the first one is right.
     4- only the second one is right.

4) the right answer is contrasted with typical errors made by a known group
   of students.
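
Variant 2 can be sketched in Python as follows. The weights are invented for
illustration; this paper does not prescribe any particular scale, and the function
names and the two-question key are assumptions.

```python
def score_answer(chosen, weights):
    """Credit for one question: each alternative carries a weight that
    reflects its degree of possibility; an unlisted answer scores 0."""
    return weights.get(chosen, 0.0)


def score_test(answers, key):
    """Total mark: answers[i] is scored against key[i]."""
    return sum(score_answer(a, w) for a, w in zip(answers, key))


# Hypothetical two-question key. In each question "a" is clearly the best
# alternative; the others receive partial credit (invented values).
key = [
    {"a": 1.0, "b": 0.5, "c": 0.25, "d": 0.0},
    {"a": 1.0, "b": 0.25, "c": 0.5, "d": 0.0},
]
print(score_test(["a", "c"], key))
```

Under right-or-wrong scoring the second answer in this example would earn nothing;
with graded weights the mark separates a near miss from a wild guess.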

Beyond the special problems posed by the objective form are the basic questions
of how to realise these simple purposes: know the group which is going to be tested;
know the language skill which is to be measured; know the situation in which the
language is to be used; and know the test which best and most quickly suits
the group, the language skill and the situation. The realization of these apparently
simple goals is made all the harder by the fact that each test is only valid for the
group it was designed to measure, and that all levels of language competence are
relative, arbitrarily fixed to meet some situation by the teacher, the programme
director, the university admissions board, etc. Consequently, the scores are not
absolute and have no meaning outside the context for which they were made. Tests
could only lose their arbitrary, relative quality if it were possible to define,
and this is rather utopian, what the language competence of the average mother
tongue speaker was; and then the second language student's use of the language could
be measured against the fixed standard of the native speaker.

Buty even if this utopian venture were possible with the help of all the socio- 
linguists y it must be remembered that the mother tongue speedcer is not equally 



coDvetent la all situations. Hsnce it follows that the student’s second language 
oompetenoe would he assessed la terns of the mother speaker's porofioienoy la only 
one situation, and his perfomaaoe la other situations would have to he inferred 
from the mother tongue speaker's perfomaaoe la those situations. 

The question must now be raised whether it is in fact possible to have a
general proficiency test which is valid for all groups, whatever the situation
where the second language is used, whatever the social cultural background of the
student may be.

Some tests appear to be effective for a fair proportion of students in a
given situation, though the social cultural background of the students is
considerably varied. A good example of this is provided by some university admission tests
in English for foreign students. Leaving aside the question whether the students
so selected do well in their studies and are better than those not admitted, and
accepting the premise that the tests are effective, it remains to be shown what it is
that makes the tests so effective.



****************** 



Bibliography



Frederick Barton Davis. Educational measurements and their interpretation.
    Belmont, California: Wadsworth, 1964. 422 p.

Robert L. Ebel. Measuring educational achievement.
    Englewood Cliffs, New Jersey: Prentice-Hall, 1965. 481 p.

Robert Lado. Language testing.
    London: Longmans, 1961. 389 p.

George Perren. Testing ability in English as a second language.
    Part 1. Problems.
    English Language Teaching, 21, 2 (1967), p. 99-106.

George Perren. Testing ability in English as a second language.
    Part 2. Techniques.
    English Language Teaching, 21, 3 (1967), p. 197-202.

George Perren. Testing ability in English as a second language.
    Part 3. Spoken language.
    English Language Teaching, 22, 1 (1967), p. 22-29.

G.D. Pickett. A comparison of translation and blank-filling as
    testing techniques.
    English Language Teaching, 23, 1 (1968), p. 21-28.

Henri Piéron. Examens et docimologie.
    Paris: Presses universitaires de France, 1963. 190 p.

Theodore H. Plaister. Testing comprehension: a culture fair approach.
    TESOL Quarterly, 1, 3 (1967), p. 17-19.

André Rey. Connaissance de l'individu par les tests.
    Bruxelles: Charles Dessart, 1968. 224 p.

Bernard Spolsky. Language testing: the problem of validation.
    TESOL Quarterly, 2, 2 (1968), p. 88-94.

Anne O. Stemmler. The LCT, language-cognition test (research edition):
    a test for educationally disadvantaged school beginners.
    TESOL Quarterly, 1, 4 (1967), p. 35-43.

John A. Upshur & Julia Fata (Eds.). Problems in foreign language testing.
    Proceedings of a conference held at the University of Michigan,
    September 1967. Language Learning, Special Issue, No. 3, August 1968.

John A. Upshur. Testing foreign-language function in children.
    TESOL Quarterly, 1, 4 (1967), p. 31-34.

Rebecca Valette. Modern language testing: a handbook.
    New York: Harcourt, Brace & World, 1967. 200 p.




